DETAILED ACTION



Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for a patent. 

2. Claims 1, 5 to 7, 9, 13 to 16, 20, 22, and 26 to 31 are rejected under 35
U.S.C. 102(a) as being anticipated by Thrift et al.

Regarding independent claims 1, 15, 16, and 30, Thrift et al. discloses a method,
system, and computer-readable medium, comprising: 

"receiving a speech signal locally from a user via a client device" - microphone 
10b receives voice input from a user; voice activated control unit 10 ("a client device") 
has microphone 10b (column 2, lines 59 to 62: Figure 1); 

"performing speech recognition on said speech signal in accordance with an 
embedded speech recognizer of said client device to produce a recognizable text 
signal, wherein said embedded speech recognizer employs a language model" - in one 
embodiment, control unit 10 performs all of the voice recognition process and delivers 
speech data to host computer 11 via transmitter 10g (column 3, lines 1 to 3: Figure 1); if
control unit 10 performs all voice recognition processes, memory 10f stores these 
processes (as a voice recognizer) as well as grammar files (column 3, lines 22 to 45: 
Figure 1); broadly, grammar files are "a language model"; implicitly, speech data is in
the form of "a recognizable text signal" because speech recognition generates text from
speech; 

"adapting said performance of speech recognition based on at least one local 
parameter" - memory 10f stores a grammar file generator for dynamically generating a 
grammar (column 3, lines 41 to 45: Figure 1); grammars for speakable links may be 
dynamically created so that only the grammar for a current display is active and is 
updated when a current display is generated; dynamic grammar creation reduces the 
amount of required memory 10f; dynamic grammar files are created from current Web
pages; every time the screen 40 changes, the user agent 64 creates a grammar 
containing the currently visible links (column 5, line 48 to column 6, line 25: Figure 5); 
dynamic updating of grammar files every time a screen changes is equivalent to 
"adapting said performance of speech recognition", where changing of a screen is "at 
least one local parameter"; 

"forwarding said recognizable text signal to a remote server" - the output of the 
voice recognizer is speech data; the speech data is transmitted to host system 11 ("a
remote server"), which performs voice control interpretation processes (column 3, lines 
45 to 56: Figure 1). 

Regarding independent claims 9, 22, and 31, Thrift et al. discloses a method,
server, and computer-readable medium, comprising: 

"receiving a recognizable text signal representative of a user speech signal from 
a client device, wherein said recognizable text is generated using a speech recognizer
having a language model on said client device" - microphone 10b receives voice input
from a user; voice activated control unit 10 ("a client device") has microphone 10b 
(column 2, lines 59 to 62: Figure 1); in one embodiment, control unit 10 performs all of 
the voice recognition process and delivers speech data to host computer 11 via
transmitter 10g (column 3, lines 1 to 3: Figure 1); if control unit 10 performs all voice
recognition processes, memory 10f stores these processes (as a voice recognizer) as
well as grammar files (column 3, lines 22 to 45: Figure 1); broadly, grammar files are "a 
language model"; implicitly, speech data is in the form of "a recognizable text signal" 
because speech recognition generates text from speech; 

"wherein said recognizable text is generated in accordance with adapting said 
performance of speech recognition based on at least one local parameter" - memory 
10f stores a grammar file generator for dynamically generating a grammar (column 3,
lines 41 to 45: Figure 1); grammars for speakable links may be dynamically created so 
that only the grammar for a current display is active and is updated when a current 
display is generated; dynamic grammar creation reduces the amount of required 
memory 10f; dynamic grammar files are created from current Web pages; every time 
the screen 40 changes, the user agent 64 creates a grammar containing the currently 
visible links (column 5, line 48 to column 6, line 25: Figure 5); dynamic updating of 
grammar files every time a screen changes is equivalent to "adapting said performance 
of speech recognition", where changing of a screen is "at least one local parameter"; 

"processing said recognizable text signal in accordance with a task model" - the 
output of the voice recognizer is speech data; the speech data is transmitted to host
system 11 ("a remote server"), which performs voice control interpretation processes;
examples of voice control interpretation are web browsing and commands to a
television (column 3, lines 45 to 65: Figure 1); web browsing and commands to a
television are examples of "a task model". 

Regarding claims 5, 13, 20, and 26, Thrift et al. discloses host 11 ("said remote
server") could dynamically generate the grammar and download the grammar file to
control unit 10 (column 3, lines 41 to 45: Figure 1); a grammar file is downloaded in
response to speech data ("said recognizable text signal") requesting a new web page 
(column 5, line 48 to column 6, line 13: Figure 5). 

Regarding claim 6, Thrift et al. discloses the output of the voice recognizer is
speech data; the speech data is transmitted to host system 11 ("a remote server"),
which performs voice control interpretation processes; examples of voice control 
interpretation processes are web browsing and commands to a television (column 3, 
lines 45 to 65: Figure 1); web browsing and commands to a television are examples of 
"a task model". 

Regarding claims 7, 14, and 28, Thrift et al. discloses examples of voice control 
interpretation are web browsing and commands to a television; host system 11 ("a
remote server") may respond to voice input to control unit 10 by executing a command 
or providing a hypermedia (Web) link (column 3, lines 45 to 65: Figure 1); thus, host 
system 11 must monitor "progress toward satisfying a goal of said user" to display a
television schedule or browse the web. 



Application/Control Number: 10/033,772 Page 6 

Art Unit: 2654 

Regarding claim 27, Thrift et al. discloses host 11 ("said remote server") could
dynamically generate the grammar and download the grammar file to control unit 10
(column 3, lines 41 to 45: Figure 1); a grammar file is downloaded in response to
speech data ("said recognizable text signal") requesting a new web page (column 5, line 
48 to column 6, line 13: Figure 5); implicitly, something that forwards grammar file 
updates from a host system 11 to a control unit 10 is "a grammar manager". 

Regarding claim 29, Thrift et al. discloses host system 11 provides voice control
interpretation processes for dialogs via speakable hotlist processes (column 4, line 33 to 
column 5, line 19: Figure 3); an interpretation process for determining which processes 
are hotlist processes is equivalent to "a dialog manager". 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 2 to 4, 10 to 12, 17 to 19, and 23 to 25 are rejected under 35 U.S.C. 
103(a) as being unpatentable over Thrift et al. in view of Balakrishnan et al. 

Thrift et al. updates a grammar file ("adapting said performance of speech 
recognition") based upon a currently displayed web page of a speakable command list 
("based on at least one local parameter"), but omits adapting performance of speech 
recognition based on a parameter representative of environmental noise, acoustic
environment, and pronunciation of a user. However, it is well known that speech
recognition systems can be trained to improve performance with respect to individual 
user pronunciations and environmental noise. Balakrishnan et al. teaches context
dependent phoneme networks that are specific to a user and an environment. (Column 
2, Lines 10 to 49) In operation, a first part of an operating system 44 generates a CD 
phoneme network in order to capture user and environment specific acoustic models, 
which are continually adapting to the user's voice, environment, and use of language. 
The second part 50 of the operating system 44 then uses appropriate search engine 
applets 51 to retrieve a CD network. (Column 4, Line 66 to Column 5, Line 56) 
Implicitly, an environment for speech recognition is inclusive of environmental noise. 
The objective is to eliminate obstacles to computer speech recognition by not requiring 
that each application will have to keep separate acoustic models for each 
user/environment and so that performance is not sacrificed. (Column 1, Lines 24 to 55)
It would have been obvious to one having ordinary skill in the art to adapt performance 
of speech recognition based on parameters representative of environmental noise, 
acoustic environment, and pronunciation of a user as taught by Balakrishnan et al. in
the wireless voice-activated device for control of a processor-based host system of
Thrift et al. for the purpose of eliminating obstacles to speech recognition by not
requiring that each application have separate acoustic models for each 
user/environment.

5. Claims 8 and 21 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Thrift et al. in view of Ramaswamy et al.

Thrift et al. discloses grammar files ("said language model") are stored in 
memory 10f of control unit 10 ("said client device"), but does not specifically say that 
grammar files are stored in a cache. However, it is well known that files currently being 
used by a computer system are commonly stored in cache to reduce memory access 
operations. Thus, it is likely implicit that memory 10f includes a cache, and grammar 
files are stored in cache memory for Thrift et al. Ramaswamy et al. teaches an 
analogous art speaker verification method and system, where speech recognition
engines use a language model. When more than one language model is used, some of 
the models may be personalized to a given user, and stored in a personal cache, built 
using words and phrases spoken frequently by a given user. (Column 5, Lines 22 to 27) 
It would have been obvious to one having ordinary skill in the art to store dynamically 
updated grammar files of control unit 10 from Thrift et al. in a cache memory as 
suggested by Ramaswamy et al. for the purpose of reducing memory access operations
for words and phrases spoken frequently by a given user.

Conclusion

6. The prior art made of record and not relied upon is considered pertinent to 
Applicants' disclosure. 

Hemphill, Sharma et al., Dragosh et al., Besling et al., Jacobs et al., Chung et al., 
and Kuhn et al. disclose related art. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Martin Lerner whose telephone number is (703) 308-9064.
The examiner can normally be reached on 8:30 AM to 6:00 PM Monday to
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on (703) 305-9645. The fax phone
number for the organization where this application or proceeding is assigned is
703-872-9306.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free).